 genetic effect


A weighted U statistic for association analysis considering genetic heterogeneity

Wei, Changshuai, Elston, Robert C., Lu, Qing

arXiv.org Artificial Intelligence

Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments (SAGE) dataset. The genome-wide analysis of nearly one million genetic markers took 7 hours, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence.
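
To make the idea of a weighted U statistic concrete, here is a minimal sketch, not the HWU method itself. It assumes a phenotype similarity given by the product of standardized phenotypes and a Gaussian-type genetic similarity weight; both kernels, the function name `weighted_u`, and the permutation p-value are illustrative assumptions that do not come from the abstract.

```python
import numpy as np

def weighted_u(y, G, rng=None, n_perm=999):
    """Generic weighted U statistic with a permutation p-value (illustrative)."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    n = y.shape[0]

    # Assumed phenotype similarity: product of standardized phenotypes.
    q = (y - y.mean()) / y.std()

    # Assumed genetic similarity weight: Gaussian-type kernel on the
    # squared genotype distance, scaled by the number of markers.
    d2 = ((G[:, None, :] - G[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / G.shape[1])
    np.fill_diagonal(W, 0.0)  # a U statistic sums over pairs i != j only

    def stat(qv):
        return qv @ W @ qv / (n * (n - 1))

    observed = stat(q)
    # Permuting phenotypes breaks any genotype-phenotype link (null hypothesis).
    perms = np.array([stat(rng.permutation(q)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p_value
```

A large positive statistic means that individuals with similar genotypes also tend to have similar phenotypes. The pairwise distance array grows as n² times the number of markers, so this sketch is suitable only for small examples.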


An Association Test Based on Kernel-Based Neural Networks for Complex Genetic Association Analysis

Hou, Tingting, Jiang, Chang, Lu, Qing

arXiv.org Artificial Intelligence

The advent of artificial intelligence, especially the progress of deep neural networks, is expected to revolutionize genetic research and offer unprecedented potential to decode the complex relationships between genetic variants and disease phenotypes, which could mark a significant step toward improving our understanding of the disease etiology. While deep neural networks hold great promise for genetic association analysis, limited research has been focused on developing neural-network-based tests to dissect complex genotype-phenotype associations. This complexity arises from the opaque nature of neural networks and the absence of defined limiting distributions. We have previously developed a kernel-based neural network model (KNN) that synergizes the strengths of linear mixed models with conventional neural networks. KNN adopts a computationally efficient minimum norm quadratic unbiased estimator (MINQUE) algorithm and uses KNN structure to capture the complex relationship between large-scale sequencing data and a disease phenotype of interest. In the KNN framework, we introduce a MINQUE-based test to assess the joint association of genetic variants with the phenotype, which considers non-linear and non-additive effects and follows a mixture of chi-square distributions. We also construct two additional tests to evaluate and interpret linear and non-linear/non-additive genetic effects, including interaction effects. Our simulations show that our method consistently controls the type I error rate under various conditions and achieves greater power than a commonly used sequence kernel association test (SKAT), especially when involving non-linear and interaction effects. When applied to real data from the UK Biobank, our approach identified genes associated with hippocampal volume, which can be further replicated and evaluated for their role in the pathogenesis of Alzheimer's disease.
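
A statistic that follows a mixture of chi-square distributions can be written as a weighted sum of independent chi-square(1) variables, and its tail probability can be approximated by simulation. The sketch below assumes the mixture weights (here called `eigenvalues`) are already known; how they are obtained in the KNN framework is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def mixture_chi2_pvalue(q_obs, eigenvalues, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(sum_k lambda_k * chi2_1 >= q_obs)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(eigenvalues, dtype=float)
    # Each row holds one independent chi-square(1) draw per mixture weight;
    # the matrix product forms the weighted sum for every row.
    draws = rng.chisquare(df=1, size=(n_draws, lam.size)) @ lam
    return float(np.mean(draws >= q_obs))
```

With a single weight of 1 the mixture reduces to an ordinary chi-square(1) distribution, which gives a quick sanity check. For very small p-values, analytic approximations (for example Davies' method or moment matching) are usually preferred over simulation.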


A Kernel-Based Neural Network Test for High-dimensional Sequencing Data Analysis

Hou, Tingting, Jiang, Chang, Lu, Qing

arXiv.org Machine Learning

The recent development of artificial intelligence (AI) technology, especially the advance of deep neural network (DNN) technology, has revolutionized many fields. While DNN plays a central role in modern AI technology, it has been rarely used in sequencing data analysis due to challenges brought by high-dimensional sequencing data (e.g., overfitting). Moreover, due to the complexity of neural networks and their unknown limiting distributions, building association tests on neural networks for genetic association analysis remains a great challenge. To address these challenges and fill the important gap of using AI in high-dimensional sequencing data analysis, we introduce a new kernel-based neural network (KNN) test for complex association analysis of sequencing data. The test is built on our previously developed KNN framework, which uses random effects to model the overall effects of high-dimensional genetic data and adopts kernel-based neural network structures to model complex genotype-phenotype relationships. Based on KNN, a Wald-type test is then introduced to evaluate the joint association of high-dimensional genetic data with a disease phenotype of interest, considering non-linear and non-additive effects (e.g., interaction effects). Through simulations, we demonstrated that our proposed method attained higher power compared to the sequence kernel association test (SKAT), especially in the presence of non-linear and interaction effects. Finally, we apply the methods to the whole genome sequencing (WGS) dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, investigating new genes associated with the hippocampal volume change over time.


AI Machine Learning Predicts Alzheimer's Disease Risk

#artificialintelligence

The most common cause of dementia worldwide is Alzheimer's disease (AD), a neurodegenerative disorder with no known cure. A new study published in Scientific Reports uses artificial intelligence (AI) machine learning (ML) and data from electronic health records (EHRs) to identify the important predictors for Alzheimer's disease and finds that a person's genetics outperforms age as a predictor for individuals who are 65 years of age and older. "Machine learning (ML) methods provide an attractive and effective alternative to traditional statistical regression models, especially in situations where one has a large number of features or predictors," wrote the authors of the National Institutes of Health (NIH) funded study led by Xiaoyi Raymond Gao at The Ohio State University College of Medicine, with Ohio State researchers Marion Chiariglione, Ke Qin and Douglas Scharre; the University of Miami researchers Karen Nuytemans and Eden Martin; and Yi-Ju Li at Duke University. Globally, Alzheimer's disease accounts for an estimated 60-70 percent of the over 55 million people with dementia and affects women disproportionately, according to the World Health Organization (WHO). In the U.S., 6.7 million people aged 65 and older are currently living with AD, almost two-thirds of them women, and that figure is projected to rise to an estimated 12.7 million Americans by 2050, according to the Alzheimer's Association.


A statistical framework for GWAS of high dimensional phenotypes using summary statistics, with application to metabolite GWAS

Huang, Weiqiong, Hector, Emily C., Cape, Joshua, McKennan, Chris

arXiv.org Machine Learning

The recent explosion of genetic and high dimensional biobank and 'omic' data has provided researchers with the opportunity to investigate the shared genetic origin (pleiotropy) of hundreds to thousands of related phenotypes. However, existing methods for multi-phenotype genome-wide association studies (GWAS) do not model pleiotropy, are only applicable to a small number of phenotypes, or provide no way to perform inference. To add further complication, raw genetic and phenotype data are rarely observed, meaning analyses must be performed on GWAS summary statistics whose statistical properties in high dimensions are poorly understood. We therefore developed a novel model, theoretical framework, and set of methods to perform Bayesian inference in GWAS of high dimensional phenotypes using summary statistics that explicitly model pleiotropy, beget fast computation, and facilitate the use of biologically informed priors. We demonstrate the utility of our procedure by applying it to metabolite GWAS, where we develop new nonparametric priors for genetic effects on metabolite levels that use known metabolic pathway information and foster interpretable inference at the pathway level.


The GTEx Consortium atlas of genetic regulatory effects across human tissues

Science

The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.


'Smart genes' account for 20% of intelligence: study

The Japan Times

PARIS – Scientists on Monday announced the discovery of 52 genes linked to human intelligence, 40 of which have been identified as such for the first time. The findings also turned up a surprising connection between intelligence and autism that could one day help shed light on the condition's origins. Taken together, the new batch of "smart genes" accounted for 20 percent of the discrepancies in IQ test results among tens of thousands of people examined, the researchers reported in the journal Nature Genetics. "For the first time, we were able to detect a substantial amount of genetic effects in IQ," said Danielle Posthuma, a researcher at the Center for Neurogenomics and Cognitive Research in Amsterdam, and the main architect of the study. "Our findings provide insight into the biological underpinnings of intelligence," she said. Most of the newly discovered gene variants linked to elevated IQ play a role in regulating cell development in the brain, especially neuron differentiation and the formation of neural information gateways called synapses.


Locally Epistatic Models for Genome-wide Prediction and Association by Importance Sampling

Akdemir, Deniz, Jannink, Jean-Luc

arXiv.org Machine Learning

In statistical genetics, an important task involves building predictive models for genotype-phenotype relationships and thereby attributing a proportion of the total phenotypic variance to the variation in genotypes. Numerous models have been proposed to incorporate additive genetic effects into models for prediction or association. However, there is a scarcity of models that can adequately account for gene-by-gene or other forms of genetic interactions. In addition, there is an increased interest in using marker annotations in genome-wide prediction and association. In this paper, we discuss a hybrid modeling methodology which combines the parametric mixed modeling approach and the non-parametric rule ensembles. This approach gives us a flexible class of models that can be used to capture additive and locally epistatic genetic effects as well as gene x background interactions, and allows us to incorporate one or more annotations into the genomic selection or association models. We use benchmark data sets covering a range of organisms and traits, in addition to simulated data sets, to illustrate the strengths of this approach. The improvements in model accuracy and association results suggest that a part of the "missing heritability" in complex traits can be captured by modeling local epistasis.


A New Statistical Framework for Genetic Pleiotropic Analysis of High Dimensional Phenotype Data

Wang, Panpan, Rahman, Mohammad, Jin, Li, Xiong, Momiao

arXiv.org Machine Learning

The widely used genetic pleiotropic analyses of multiple phenotypes are often designed for examining the relationship between common variants and a few phenotypes. They are not suited for both high dimensional phenotypes and high dimensional genotype (next-generation sequencing) data. To overcome these limitations, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend the traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternating direction method of multipliers (ADMM) techniques to reduce data dimension and improve computational efficiency. Using large scale simulations, we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods. Simulations also demonstrate that the gene-based pleiotropic analysis has higher power than the single variant-based pleiotropic analysis. The proposed method is applied to exome sequence data from the NHLBI Exome Sequencing Project (ESP) with 11 phenotypes, which identifies a network with 137 genes connected to 11 phenotypes and 341 edges. Among them, 114 genes showed pleiotropic genetic effects and 45 genes were reported to be associated with phenotypes in the analysis or other cardiovascular disease (CVD) related phenotypes in the literature.
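
As a rough illustration of how ADMM produces sparse estimates, the sketch below applies it to an ordinary lasso regression, minimizing 0.5 * ||A x - b||^2 + lam * ||x||_1. This is a standard textbook use of ADMM and is not the sparse functional SEM fitted in the paper; the function names, the penalty `lam`, the step parameter `rho`, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, k):
    """Elementwise soft-thresholding, the proximal operator of k * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=500):
    """Lasso via ADMM: minimize 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    p = A.shape[1]
    x = np.zeros(p)
    z = np.zeros(p)  # sparse copy of x
    u = np.zeros(p)  # scaled dual variable

    # Factor the ridge-type system once; it is reused in every iteration.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))
    Atb = A.T @ b

    for _ in range(n_iter):
        # x-update: solve (A^T A + rho I) x = A^T b + rho (z - u).
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: soft-thresholding enforces sparsity.
        z = soft_threshold(x + u, lam / rho)
        # Dual update: accumulate the disagreement between x and z.
        u = u + x - z
    return z
```

When `A` is the identity matrix, the lasso solution is simply the soft-thresholded `b`, which makes the routine easy to check by hand.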


Genomic Prediction of Quantitative Traits using Sparse and Locally Epistatic Models

Akdemir, Deniz

arXiv.org Machine Learning

In plant and animal breeding studies a distinction is made between the genetic value (additive + epistatic genetic effects) and the breeding value (additive genetic effects) of an individual since it is expected that some of the epistatic genetic effects will be lost due to recombination. In this paper, we argue that the breeder can take advantage of some of the epistatic marker effects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using the genetic map information and combine the local additive and epistatic effects. To this end, we have used semi-parametric mixed models with multiple local genomic relationship matrices with hierarchical designs and lasso post-processing for sparsity in the final model. Our models produce good predictive performance along with good explanatory information.